Creating a ranking of the 100 greatest musical artists of all time is an ambitious task. Judging an artist's quality and impact over time is so evidently subjective that even ChatGPT acknowledges the highly personal nature of any such ranking. Nevertheless, that is what Rolling Stone magazine attempted in 2010 with the publication of the “100 Greatest Musical Artists of All Time”.
In that context, this study explores the enduring impact of these artists and their music at the end of 2023. Using data from Spotify, it first establishes whether the artists in the Rolling Stone ranking are still popular and relevant at the end of 2023 and, within that scope, seeks to identify the factors that may contribute to continued engagement with the music of these celebrated artists.
The analysis is divided into three steps:
1. Gathering data from Spotify to analyse how popular the ranked artists are at the end of 2023.
2. Comparing the audio features of the more popular ranked artists with those of the less popular ones.
3. Comparing the audio features of the ranked artists with those of the current most popular artists, according to the Global Weekly Spotify Charts.
The “100 Greatest Musical Artists of All Time” ranking was collected from the Rolling Stone website and served as the starting point of the analysis.
The data analysis, on the other hand, relies on the information available through the Spotify API, from which data on the popularity and audio features of artists and tracks, among other attributes, was retrieved directly. The API was accessed through the spotifyr R package.
We begin by looking at the artist popularity index provided by the Spotify API. According to the documentation, this index is based on the popularity of the artist's tracks and ranges from 0 to 100, with 100 being the most popular.
Each track on Spotify also has a popularity score between 0 and 100, with 100 being the most popular. This value is based, for the most part, on the total number of plays the track has had and how recent those plays are. Although we cannot know the exact weight given to the recency of the plays, this measure suits the main goal of this study, which is to assess the artists' relevance at the present moment: according to the documentation, “songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past”.
Another limitation is that the popularity value is not updated in real time and may lag actual popularity by a few days. Since the exact moment of the analysis is not critical in the context of this study, this does not compromise the use of the variable.
Since the index ranges from 0 to 100, and assuming that artists' popularity on Spotify is roughly normally distributed around the midpoint of the scale, it is reasonable to classify artists with a popularity above 50 as “Popular” and those at or below that threshold as “Not Popular”.
In the interactive plot below, each dot is an artist from the Rolling Stone ranking. The x axis shows the artist's position in the ranking, the y axis shows its Spotify popularity, and the colors indicate whether that popularity is above 50. The red line is the linear regression line of the two variables.
library(dplyr)
library(ggplot2)
library(plotly) # provides ggplotly() for the interactive version
#Scatter plot of ranking position vs. Spotify popularity, with a linear regression line
assignment_plot <- arranging_data %>%
ggplot() +
geom_point(aes(x = ranking, y = popularity, color = is_popular, text = text), size = 3, alpha = 0.5) + # 'text' is only used by the plotly tooltip
geom_line(stat = "smooth", method = lm, formula = y ~ x, aes(x = ranking, y = popularity), color = "red", alpha = 0.5) +
theme_minimal() +
labs(title = "Rolling Stone Ranking x Spotify Popularity",
color = NULL,
x = "Rolling Stone Ranking",
y = "Spotify Popularity") +
ylim(0,100) +
xlim(0,100) +
geom_vline(xintercept = 0, color = "black", alpha = 0.1) + # Add vertical line at x = 0
geom_hline(yintercept = 0, color = "black", alpha = 0.1) + # Add horizontal line at y = 0
theme(legend.position = "bottom", # Adjust legend position
panel.grid = element_line(color = alpha("gray", 0.2), linetype = "dashed"), # Light dashed grid lines
axis.text = element_text(size = 8), # Adjust axis text size
axis.title = element_text(size = 10), # Adjust axis title size
plot.title = element_text(hjust = 0.5)) # Center the title
#Interactive version: the tooltip shows the 'text' aesthetic
pp_assignment_plot <- ggplotly(assignment_plot, tooltip = "text")
pp_assignment_plot
The vast majority of the artists in the ranking can be classified as Popular. The red regression line has a slightly negative slope, which is the expected direction: a better ranking position corresponds to a smaller number, whereas greater popularity corresponds to a larger value. However, the line is nearly flat, which suggests that the relationship between the ranking and Spotify popularity is weak at best: a better position in the Rolling Stone ranking does not translate into noticeably higher popularity in 2023.
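The slope of a regression line depends on the units of both axes, so a nearly flat line does not by itself measure how strong the association is. The following is a minimal sketch, assuming `ranking` and `popularity` are numeric vectors such as the columns of `arranging_data` used in the plot; it reports the slope next to Pearson's correlation and Spearman's rank correlation, the latter being a natural choice for an ordinal variable like a ranking. The helper name `rank_popularity_stats` is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): summarise the
# ranking/popularity relationship with the regression slope and two correlations
rank_popularity_stats <- function(ranking, popularity) {
  fit <- lm(popularity ~ ranking)  # same kind of linear fit as the red line
  c(
    slope = coef(fit)[["ranking"]],
    pearson = cor(ranking, popularity, method = "pearson"),
    spearman = cor(ranking, popularity, method = "spearman")  # rank-based, suits an ordinal ranking
  )
}

# Usage, assuming the 'arranging_data' data frame used for the plot:
# rank_popularity_stats(arranging_data$ranking, arranging_data$popularity)
# cor.test(arranging_data$ranking, arranging_data$popularity, method = "spearman")
```

A correlation close to 0 would support the reading that ranking position and current popularity are largely unrelated; `cor.test()` adds a p-value if a formal test is wanted.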
With that in mind, we can already establish that the music of most artists in the Rolling Stone ranking has indeed endured over time, and they remain popular in 2023. Next, we try to identify what differentiates the more popular artists in the ranking from the less popular ones.
The plot above and the statistical summary below show that the distribution of popularity is not heavily skewed: the median (66) is close to the mean (65.29), and the values are fairly evenly spread, with no large outliers visible on inspection of the plot.
However, the data shows a wide gap between the minimum (33) and the maximum (90) popularity, which suggests that there may be differences among the artists that account for higher or lower popularity.
# Print a message
cat("Statistical Summary of Spotify Popularity\n\n")
## Statistical Summary of Spotify Popularity
# Print the summary of the 'popularity' variable
print(summary(artist_info_ranking$popularity))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 33.00 55.00 66.00 65.29 75.00 90.00
The distribution of popularity among the artists appears to be approximately normal.
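A mean close to the median indicates symmetry, but not necessarily normality. A minimal sketch of a slightly firmer check, assuming the popularity values are available as a numeric vector, combines a simple skewness estimate with the Shapiro-Wilk test from base R (`shapiro.test`, which accepts between 3 and 5000 values). The helper name `normality_check` is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): informal checks
# of how close a numeric vector is to a normal distribution
normality_check <- function(x) {
  x <- x[!is.na(x)]
  z <- (x - mean(x)) / sd(x)
  list(
    mean = mean(x),
    median = median(x),
    skewness = mean(z^3),                # simple moment-based estimate; near 0 for symmetric data
    shapiro_p = shapiro.test(x)$p.value  # small values (e.g. < 0.05) argue against normality
  )
}

# Usage: normality_check(artist_info_ranking$popularity)
# Visual check: qqnorm(artist_info_ranking$popularity); qqline(artist_info_ranking$popularity)
```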
library(dplyr)
artist_info_ranking <- readRDS("artist_info_ranking.rds")
top_tracks_us <- readRDS("top_tracks_us.rds")
top_tracks_gb <- readRDS("top_tracks_gb.rds")
audio_features_us <- readRDS("audio_features_us.rds")
audio_features_gb <- readRDS("audio_features_gb.rds")
#Defining a popular artist as anyone with more than 50 in popularity
artist_info_ranking$is_popular <- ifelse(artist_info_ranking$popularity > 50, "Popular", "Not Popular")
#Defining the relative popularity: artists above the mean popularity of the ranking are "Popular"
artist_info_ranking$relative_pop <- ifelse(artist_info_ranking$popularity <= mean(artist_info_ranking$popularity), "Not Popular", "Popular")
#subsetting the columns that are going to be needed
subset_artist <- artist_info_ranking %>% select(id, artist_spotify, popularity, relative_pop, genres)
subset_top_tracks <- top_tracks_us %>% select(artist_id, track_popularity, track_id, track_name, album.release_date)
merged_data <- merge(subset_artist, subset_top_tracks, by.x = "id", by.y = "artist_id")
#Adding each artist's average top-track popularity as a column for plotting
merged_data <- merged_data %>%
group_by(artist_spotify) %>%
mutate(mean_track_popularity = mean(track_popularity)) %>%
ungroup()
#Factoring artist_spotify in order of artist popularity: the first level is drawn at the bottom of the y axis,
#so the plot reads in ascending popularity from top to bottom
popularity_order <- artist_info_ranking %>% select(artist_spotify, popularity) %>% arrange(desc(popularity))
merged_data$artist_spotify <- factor(merged_data$artist_spotify, levels = popularity_order$artist_spotify)
class(merged_data$artist_spotify)
## [1] "factor"
p3 <- ggplot(merged_data, aes(y = artist_spotify)) +
geom_point(aes(x = track_popularity, color = "Track Popularity"), size = 1.5, alpha = 0.7) +
geom_point(aes(x = popularity, color = "Artist Popularity"), size = 1.5, alpha = 0.7) +
geom_point(aes(x = mean_track_popularity, color = "Average Track Popularity"), size = 1.5, alpha = 0.7) +
theme_minimal() +
labs(title = "Artist and Top Tracks (US) Popularity",
y = NULL,
x = "Popularity",
subtitle = "Artists are in Ascending Popularity Order") +
scale_color_manual(
values = c("Artist Popularity" = "#E63946",
"Average Track Popularity" = "#FBAF4F",
"Track Popularity" = "#BFD3C1"),
name = NULL
) +
theme(axis.text.x = element_text(hjust = 1, vjust = 1),
panel.grid = element_line(color = alpha("gray", 0.2), linetype = "solid"),
panel.grid.major.x = element_line(linetype = "dashed"),
panel.grid.minor.x = element_line(linetype = "dashed"),
axis.text.y = element_text(hjust = 1, vjust = 0.5),
plot.title = element_text(hjust = 0.5, size = 16),
plot.subtitle = element_text(hjust = 0.5, size = 11),
legend.position = "top") +
geom_vline(xintercept = 0, color = "black", alpha = 0.2) + # Add vertical line at x = 0
scale_x_continuous(breaks = c(0,25,50,75,100),
expand = c(0,0),
limits = c(-1,101),
sec.axis = sec_axis(~., name = "Popularity"))
p3
For every artist in the Rolling Stone ranking, the artist's Spotify popularity is at least as high as the average popularity of its top tracks in the US. Still, the two measures move together: artists whose top tracks are more popular tend to have a higher artist popularity as well.
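Both observations can be checked numerically rather than visually. A minimal sketch, assuming a data frame with one row per artist and top track and the columns `artist_spotify`, `popularity` and `track_popularity` (as in `merged_data`), aggregates the tracks per artist with base R, tests whether artist popularity is never below the track average, and reports the correlation between the two. The helper name `compare_artist_tracks` is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): compare each
# artist's popularity with the mean popularity of their top tracks
compare_artist_tracks <- function(tracks) {
  # popularity is constant within an artist, so its mean is the artist's value
  per_artist <- aggregate(cbind(popularity, track_popularity) ~ artist_spotify,
                          data = tracks, FUN = mean)
  names(per_artist)[names(per_artist) == "track_popularity"] <- "mean_track_popularity"
  list(
    per_artist = per_artist,
    # TRUE if no artist's top tracks are, on average, more popular than the artist
    artist_at_least_track_mean = all(per_artist$popularity >= per_artist$mean_track_popularity),
    # How closely the two measures move together across artists
    correlation = cor(per_artist$popularity, per_artist$mean_track_popularity)
  )
}

# Usage: res <- compare_artist_tracks(merged_data)
# res$artist_at_least_track_mean; res$correlation
```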
Another aspect worth highlighting in the track popularity data is the presence of outliers, especially among the less popular artists. Using the interquartile range (IQR) rule to flag tracks more than 1.5 × IQR above an artist's third quartile, the table below shows that the mean popularity of the artists with positive outliers (61.12) is below the mean popularity of all artists in the Rolling Stone ranking (65.29). This suggests that today's most popular artists are not the ones with one-time hits, but those who have had multiple hits.
#Assign tracks that have positive outliers using IQR
merged_data_outlier <- merged_data %>%
group_by(artist_spotify) %>%
mutate(
Q1 = quantile(track_popularity, 0.25),
Q3 = quantile(track_popularity, 0.75),
IQR_value = Q3 - Q1,
upper_bound = Q3 + 1.5 * IQR_value,
outliers = ifelse(track_popularity > upper_bound, "Positive Outlier", "Not Outlier"),
mean_popularity = mean(track_popularity),
diff_to_mean = track_popularity - mean_popularity
) %>%
ungroup()
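#Summarising the positive outliers. Note: popularity is averaged over outlier tracks, so an artist with several outliers is counted more than once
outlier_summary <- merged_data_outlier %>%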
filter(outliers == "Positive Outlier") %>%
summarise("Total Positive Outliers" = n(),
"Unique Artists With Positive Outliers" = length(unique(artist_spotify)),
"Mean Artist Popularity" = mean(popularity),
"Median Artist Popularity" = median(popularity),
"Mean Difference to Average Track Popularity" = mean(diff_to_mean),
"Max Difference to Average Track Popularity" = max(diff_to_mean))
library(knitr) #For printing table of Outlier Summary
# Convert the summary table to a nicely formatted table
kable(outlier_summary, "simple", align = 'c', caption = "Positive Outliers Summary")
Table: Positive Outliers Summary

| Total Positive Outliers | Unique Artists With Positive Outliers | Mean Artist Popularity | Median Artist Popularity | Mean Difference to Average Track Popularity | Max Difference to Average Track Popularity |
|---|---|---|---|---|---|
| 34 | 29 | 61.12 | 62.5 | 14.69 | 42.1 |
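Because the “Mean Artist Popularity” above is averaged over the 34 outlier tracks rather than the 29 artists, artists with more than one outlier weigh more than once. A minimal sketch of a per-artist comparison, assuming the `merged_data` columns `artist_spotify`, `popularity` and `track_popularity`, applies the same 1.5 × IQR upper fence within each artist and then counts every artist once, contrasting artists with and without a positive outlier. The helper name `outlier_artist_comparison` is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): count each artist
# once and compare artists with and without a positive outlier track, using
# the same 1.5 * IQR upper fence (per artist) as the code above
outlier_artist_comparison <- function(tracks) {
  artist <- as.character(tracks$artist_spotify)  # drop unused factor levels
  has_outlier <- tapply(tracks$track_popularity, artist, function(x) {
    q <- quantile(x, c(0.25, 0.75))
    any(x > q[[2]] + 1.5 * (q[[2]] - q[[1]]))
  })
  # Artist popularity is constant within an artist, so take the first value
  artist_pop <- tapply(tracks$popularity, artist, function(x) x[1])
  data.frame(
    group = c("With positive outlier", "Without positive outlier", "All artists"),
    n_artists = c(sum(has_outlier), sum(!has_outlier), length(has_outlier)),
    mean_popularity = c(mean(artist_pop[has_outlier]),
                        mean(artist_pop[!has_outlier]),
                        mean(artist_pop))
  )
}

# Usage: outlier_artist_comparison(merged_data)
```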
# Load required libraries
library(ggplot2)
library(patchwork)
# Merge the dataframes based on artist_spotify
merged_df <- merge(audio_features_us, artist_info_ranking, by = "artist_spotify")
# Recode relative_pop as "Yes" (above-mean artist popularity) or "No"
merged_df$relative_pop <- ifelse(merged_df$relative_pop == "Popular", "Yes", "No")
# Select relevant columns for comparison
comparison_cols <- c("danceability", "energy", "key", "loudness", "mode",
"speechiness", "acousticness", "instrumentalness",
"liveness", "valence", "tempo")
# Create one box plot per audio feature, comparing above- and below-average artists
plots <- lapply(comparison_cols, function(col) {
ggplot(merged_df, aes(x = relative_pop, y = .data[[col]], fill = relative_pop)) +
geom_boxplot() +
labs(title = col,
x = "Above-Average Popularity",
y = col) +
theme_minimal() +
theme(legend.position = "none") # Remove legend for clarity
})
# Arrange plots in a grid with 3 columns
grid_plots <- wrap_plots(plots, ncol = 3)
# Display the grid of plots
print(grid_plots)
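The box plots can be complemented by a compact numeric summary. A minimal sketch, assuming `merged_df` with `relative_pop` recoded to "Yes"/"No" as above, computes the median of each audio feature for both groups and their difference, which makes it easier to see which features separate the more popular ranked artists from the less popular ones. The helper name `feature_median_gap` is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): median of each
# audio feature for above-average ("Yes") and below-average ("No") artists
feature_median_gap <- function(df, cols, group_col = "relative_pop") {
  rows <- lapply(cols, function(col) {
    yes <- median(df[[col]][df[[group_col]] == "Yes"], na.rm = TRUE)
    no <- median(df[[col]][df[[group_col]] == "No"], na.rm = TRUE)
    data.frame(feature = col, median_yes = yes, median_no = no,
               difference = yes - no, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # one row per feature
}

# Usage: feature_median_gap(merged_df, comparison_cols)
```

Spotify encodes `key` as a pitch class (0 to 11) and `mode` as 1 for major and 0 for minor, so medians of those two columns should be read with care.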
artist_info_ranking <- readRDS("artist_info_ranking.rds")
top_tracks_us <- readRDS("top_tracks_us.rds")
top_tracks_gb <- readRDS("top_tracks_gb.rds")
audio_features_us <- readRDS("audio_features_us.rds")
audio_features_gb <- readRDS("audio_features_gb.rds")
#Defining the relative popularity
artist_info_ranking$relative_pop <- ifelse(artist_info_ranking$popularity <= mean(artist_info_ranking$popularity), "Not Popular", "Popular")
#Getting the tracks popularity along with the audio features
subset_top_track <- top_tracks_us %>% select(artist_id, track_id, track_name, track_popularity)
merged_df <- merge(audio_features_us, subset_top_track, by = c("track_id", "artist_id"))
subset_artist_pop <- artist_info_ranking %>% select(artist_id = id, artist_popularity = popularity, relative_pop)
merged_df <- merge(merged_df, subset_artist_pop, by = "artist_id")
# Recode relative_pop as "Yes" (above-mean artist popularity) or "No"
merged_df$relative_pop <- ifelse(merged_df$relative_pop == "Popular", "Yes", "No")
# Load required libraries
library(ggplot2)
library(patchwork)
# Select relevant columns for scatter plots
scatter_cols <- c("danceability", "energy", "key", "loudness", "mode",
"speechiness", "acousticness", "instrumentalness",
"liveness", "valence", "tempo")
# Create one scatter plot of track popularity vs. each audio feature
plots <- lapply(scatter_cols, function(col) {
ggplot(merged_df, aes(x = track_popularity, y = .data[[col]], color = relative_pop)) +
geom_point(alpha = 0.5) +
labs(title = paste(col),
x = "Track Popularity",
y = col,
color = "Relative Popularity") +
theme_minimal()+
theme(legend.position = "bottom")
})
# Arrange plots in a grid with 3 columns
grid_plots <- wrap_plots(plots, ncol = 3)
# Display the grid of plots
print(grid_plots)
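The scatter plots can likewise be summarised with one number per feature. A minimal sketch, assuming `merged_df` holds one row per track with `track_popularity` and the audio feature columns, computes the Spearman correlation between each feature and track popularity; being rank-based, it does not assume a linear relationship. The helper name `feature_popularity_cor` is hypothetical.

```r
# Hypothetical helper (not part of the original analysis): Spearman correlation
# between each audio feature and track popularity
feature_popularity_cor <- function(df, cols, target = "track_popularity") {
  sapply(cols, function(col) {
    cor(df[[col]], df[[target]], method = "spearman", use = "complete.obs")
  })
}

# Usage: overall, then separately for each relative_pop group
# sort(feature_popularity_cor(merged_df, scatter_cols))
# lapply(split(merged_df, merged_df$relative_pop), feature_popularity_cor, cols = scatter_cols)
```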
#Next steps
# 1 - After tidying the artist_id_pop data, add the id to the previous rs_ranking
#     Create a relational SQL database to get the information using the Spotify id as a key
# 2 - Get top tracks for each artist
# 3 - Get audio features for every track
# 4 - Find similarities and build graphs to show how the artists' popularity, genres and audio features relate
# 5 - Get the top 50 global artists on Spotify
# 6 - See how the audio features of the top 50 relate to those of the artists in the ranking
# 7 - Argue that popularity is a controversial measure and can change from country to country
# 8 - The ranking is 100% anglophone (plot a map of the artists' origins)
# 9 - Get the end date of each band or artist from the other website to see if the end date is related to popularity
# 10 - Discuss the results